Abstract. We report converging evidence that higher stages of the visual system are critically required 
for the whole to become more than the sum of its parts by studying patient DF with visual agnosia 
using a configural superiority paradigm. We demonstrate a clear dissociation between this patient 
and normal controls such that she could more easily report information about parts, demonstrating 
a striking reversal of the normal configural superiority effect. Furthermore, by comparing DF's 
performance to earlier neuroimaging and novel modeling work, we found a compelling consistency 
between her performance and representations in the early visual areas, which are spared in this 
patient. The reversed pattern of performance in this patient highlights that in some cases visual 
Gestalts do not emerge early on without processing in higher visual areas. More broadly, this study 
demonstrates how neuropsychological patients can be used to unmask representations maintained 
at early stages of processing. 
1 Main text 

How do wholes become different from the sum of their parts? This classic Gestalt question is elegantly 
brought to experimental life in a simple configural superiority paradigm devised by James Pomerantz 
and colleagues (Pomerantz, Sager, & Stoever, 1977). When observers have to indicate which of four 
oriented lines has a different orientation (see Figure 1), Pomerantz et al. (1977) found that adding an 
identical, and thus objectively uninformative, corner to each line led to a pronounced increase in the 
perceptual salience of the resulting shape. The behavioral advantage for this configuration has been 
argued to reflect the role of higher visual areas in Gestalt formation (Kubilius, Wagemans, & Op de 
Beeck, 2011). More specifically, Kubilius et al. (2011) used multi-voxel pattern analysis (MVPA) 
techniques in a functional magnetic resonance imaging (fMRI) experiment to demonstrate that patterns of 
activation evoked by this paradigm revealed an advantage for the configural "whole" condition only in 
the higher visual areas lateral occipital (LO) and posterior fusiform (pFs). Early areas, like the primary 
visual cortex (V1), in fact showed the opposite, i.e., better discrimination in the isolated "part" 
condition. This result potentially demonstrates that configural information exists neither in the stimulus 
nor in its representation in early visual areas: Gestalts only become different from the sum of their parts 
at higher stages of processing. 

The results of Kubilius et al. (2011) are limited, however, in two respects: first, like all neuroimaging 
results, they are limited to an observation of correlation; second, they rest on an assumption often made 
when using MVPA, namely that the patterns of activation at the level of voxels (averaging over hundreds of 
thousands of neurons) can provide a direct measure of the representational content of a given area of the brain. 



L. H. de-Wit and J. Kubilius contributed equally to this work. 






de-Wit L H, Kubilius J, Op de Beeck H P, Wagemans J 



Here we provide direct causal evidence that higher visual areas are required to construct the 
configural superiority Gestalt. In particular, we examined perceptual grouping in the visual form agnosia 
patient DF. This patient has been most famously studied in terms of a dissociation in her ability to 
perceive the world versus her ability to guide visual responses (Goodale, Milner, Jakobson, & Carey, 
1991). In this study, we do not focus on this patient's "vision for action" but rather on the fact that her 
lesion centers bilaterally on the LO area in the ventral stream (James, Culham, Humphrey, Milner, 
& Goodale, 2003). Earlier stages of her visual system, including the primary visual cortex, seem to 
be relatively spared (Bridge et al., 2013) and support functional behavior, including the guidance of 
action and basic orientation illusions such as the McCollough effect (Keith, Goodale, & Gurnsey, 
1991). This patient provides a perfect test case for the configural superiority effect: If higher stages of 
the ventral stream are required for the construction of such seemingly basic Gestalts, then one predicts 
that DF will not reveal the configural superiority advantage observed in normal participants. 

We tested DF with the same paradigm used by Kubilius et al. (2011) with two modifications that 
were expected to optimize the design for testing this patient and age-matched controls, as explained 
below. In each trial, a display of four elements was presented (Figure 1, left; see "parts" and "wholes"). 
Three of these elements were identical, and one was different. In the parts condition, the identical 
elements were straight lines oriented at a 45° angle, and the odd element was a line oriented at a 135° 
angle. In the whole condition, additional lines forming a right angle (a corner) were added to each of 
the four lines, resulting in three arrows and a triangle. These elements were presented 7° away from the 
central fixation dot. DF was asked to choose which quadrant contained the odd element. 

Figure 1. The hypothesized read-out of parts and wholes in the visual hierarchy. Top and middle rows: schematic 
depiction of DF's visual lesion along the ventral stream. Functional MRI scans show that DF lacks mid- and 
higher-level visual areas in the ventral visual pathway, particularly LO (James et al., 2003), which forms part of 
the lateral occipital complex (LOC). Bottom row: representation of the assumed read-out of the content in early 
visual areas, based on fMRI data (from Kubilius et al., 2011), DF's performance, and a V1 model (HMAX layer 
C1; Riesenhuber & Poggio, 1999). The performance of age-matched and young controls presumably reflects a 
read-out of representations available in LOC, given the consistency between the fMRI decoding in this area and 
the behavioral advantage in normal adults (figure adapted from DiCarlo & Cox, 2007; see Kubilius, 2013b). 

First, unlike in Kubilius et al. (2011), the stimuli remained on the screen until the patient made a 
response, rather than being briefly flashed. Second, rather than having to press buttons to identify the 
location of the "odd-one-out," the patient responded by pointing to the option she wished to select. 
Note that this was not intended to make the task a "vision for action" task; rather, DF was using 
a simple egocentric action to symbolically provide her response. Her responses were coded by the 
experimenter. Twelve age-matched controls (six males, six females; ages 51-61 years) were tested 
using the same procedure. 

Figure 2. Discriminability between stimuli in simple V1 models: Pixelwise, GaborJet, and HMAX C1. GaborJet 
and HMAX layer C1 outputs resembled both DF's performance and discriminability of stimuli in the primary 
visual cortex (V1) of young participants tested in Kubilius et al. (2011). The Pixelwise model exhibited only a minute 
advantage of parts versus wholes (8.87x10'^ vs. 8.54x10"^) due to a large number of identical gray background 
pixels. 

The results revealed a striking dissociation between DF and the control group, both in terms of 
accuracy and correct reaction time (see Figure 1). Using a statistic developed by Crawford, 
Garthwaite, and Porter (2010) to compare the difference score of a single patient to a small control sample, 
we can confirm that DF's results clearly dissociate from the normal population (two-tailed: p < .001 
for accuracy and p < .00001 for reaction time). While the control participants exhibited a robust 
configural superiority advantage (t(11) = 4.07, p = .002; see Figure 3 for individual performance), DF 
showed the opposite: she performed better in the isolated line or part condition (two-proportion z-test, 
z = 3.19, p < .0001). DF's performance is in fact strikingly consistent with the discriminability of 
patterns of activation measured with fMRI in the early visual cortex of healthy young participants 
(Kubilius et al., 2011). DF's reaction times also revealed the same reversal, with much faster performance in 
the isolated line condition, though overall DF's reaction times were much slower than those of healthy controls 
(Figure 4). 
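
As an illustration of the two-proportion z-test mentioned above, the following minimal sketch compares accuracy in two conditions using a pooled standard error. The function name and the trial counts are hypothetical placeholders chosen for illustration; they are not DF's data, and the authors' own analysis code may differ.

```python
import math

def two_proportion_z(correct_1, n_1, correct_2, n_2):
    """Pooled two-proportion z-test; returns (z, two-tailed p)."""
    p1 = correct_1 / n_1
    p2 = correct_2 / n_2
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_1 + correct_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p1 - p2) / se
    # Two-tailed p-value from the standard normal distribution.
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed

# Hypothetical counts (NOT the patient's data): 90/100 vs. 70/100 correct.
z, p = two_proportion_z(90, 100, 70, 100)
print("z = %.2f, two-tailed p = %.5f" % (z, p))
```

A positive z here means that the first condition (for example, parts) was answered correctly more often than the second (for example, wholes).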

Figure 3. Accuracies for all age-matched control participants. Dashed line indicates chance level (25%). 

Figure 4. Response times (plotted for correct responses only) for patient DF, age-matched controls, and young 
controls revealed a robust interaction (two-tailed p < .0001 using Crawford et al., 2010). Moreover, there was a 
highly statistically significant difference between the parts and whole conditions in each group (DF: generalized 
linear model using binomial distribution t(275) = 5.75, p < .0001; age-matched controls: two-tailed related 
samples t(11) = 9.6'\, p < .0001; young controls: two-tailed related samples t(7) = 20.62, p < .0001). 

This result is consistent with the hypothesis that early visual cortex is not sufficient to produce 
compelling Gestalt representations. It might appear counterintuitive that DF should not only reveal 
no configural advantage, but also perform better in the isolated line condition than in the configural 
condition. To address this issue further, we compared DF's performance to different models of V1 that 
apply Gabor filters at various orientations, spatial frequencies, and scales (Figure 1; Riesenhuber & 
Poggio, 1999; Xu, Yue, Lescroart, Biederman, & Kim, 2009; see Figure 2 for comparisons to other 
models). Just like DF, these models discriminated better between parts than whole shapes, indicating 
that at an early stage of processing, the additional corner units simply add distracting noise or clutter 
that requires further processing to be usefully organized as a genuine Gestalt. 
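
To make the notion of an orientation-tuned Gabor filter concrete, the sketch below computes the response of a single even-symmetric Gabor filter to a one-pixel-wide line through the filter's center. The grid size, sigma, and wavelength are illustrative assumptions, not the parameters used by GaborJet or HMAX, and the function name is hypothetical.

```python
import math

def gabor_response(line_deg, preferred_deg, half_size=10, sigma=3.0, wavelength=6.0):
    """Absolute response of one Gabor filter to a line through the origin.

    line_deg:      orientation of the stimulus line, in degrees
    preferred_deg: line orientation the filter is tuned to, in degrees
    """
    phi = math.radians(line_deg)
    phi0 = math.radians(preferred_deg)
    total = 0.0
    for y in range(-half_size, half_size + 1):
        for x in range(-half_size, half_size + 1):
            # A pixel is part of the line if it lies within half a pixel of it.
            if abs(x * math.sin(phi) - y * math.cos(phi)) >= 0.5:
                continue
            # Signed distance of the pixel from the filter's preferred line.
            across = -x * math.sin(phi0) + y * math.cos(phi0)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * across / wavelength)
            total += envelope * carrier
    return abs(total)

matched = gabor_response(45, 45)      # 45° line, filter tuned to 45°
mismatched = gabor_response(135, 45)  # 135° line, same filter
print(matched, mismatched)
```

With these settings the filter responds far more strongly to the line at its preferred orientation than to the orthogonal one. A bank of such filters at several orientations and scales is the kind of front end the V1 models referred to above are built on.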

Together, our results provide converging neuropsychological and computational evidence that 
higher visual areas are instrumental in the emergence of these configural Gestalts. Thus, while the 
primary visual cortex might show some sensitivity to certain grouping cues (Wannig, Stanisor, & 
Roelfsema, 2011), our study highlights the critical importance of higher-level vision in organizing 
visual input such that the whole becomes quantifiably different from its parts. Viewed as a visual 
search task, our results with the "configural superiority effect" also suggest that the representational 
differences that are most "salient" for the visual system are not computed in earlier areas, providing 
an important challenge to models that assume that salience is computed by V1 (Li, 1999, 2009). More 
broadly, our results highlight how neuropsychological patients can be used to test computational 
models by "unmasking" the representations at earlier stages of processing (Mannan, Kennard, & Husain, 
2009; Ossandon et al., 2012). 

2 Methods 

2.1 Participants 

Patient DF, aged 59, and twelve age-matched control participants (six females, six males; 
ages 51-61 years) participated in the study. The experiment was approved by the ethical committee of 
the Faculty of Psychology and Educational Sciences at KU Leuven. 

2.2 Software 

The experiment was coded, presented, and analyzed in Python 2.7 (with the exception of the 
comparison of patient DF to control participants, which was analyzed using custom software developed by 
Crawford et al., 2010, accessible at 
http://homepages.abdn.ac.uk/j.crawford/pages/dept/SingleCaseMethodology.htm) using the psychopy_ext 
extension (Kubilius, 2013a) for PsychoPy (Peirce, 2009, 2009), pandas, and matplotlib. Full source code 
and all collected data are available online at https://bitbucket.org/qbilius/df 

2.3 Model simulations 

We used three simple models of V1: 

(i) Pixelwise, where raw pixel values are used for comparing stimuli; 

(ii) GaborJet (Xu et al., 2009), where a given image is decomposed using Gabor filters of eight 
orientations and five spatial scales at 100 image locations, resulting in a 4,000-dimensional output vector; 

(iii) layer C1 of the original HMAX model, presumably similar to V1 (Riesenhuber & Poggio, 1999), 
where four orientations and 12 spatial scales of Gabor filters are applied at each image location 
(layer S1) and pooled over nearby locations and sizes. 

Each model was provided with 256 x 256 px images of each configuration observed by participants. 
Discriminability was computed using the dissimilarity measure proposed by Xu et al. (2009). Briefly, 
this measure is the absolute value of the sine of the angle between the (vectorized) outputs of a model 
for two stimuli. If the two outputs are very similar, the angle is small, resulting in a low 
discriminability. Conversely, if the two outputs are dissimilar, the angle is close to 90°, resulting in 
a high discriminability value of nearly 1. 
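
The sine-based measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name and the six-element toy vectors are hypothetical stand-ins for real model outputs, and the toy case only covers a linear, pixel-like representation in which the added corner does not overlap the lines.

```python
import math

def sine_dissimilarity(u, v):
    """Return |sin(angle)| between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    cos_angle = dot / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.sqrt(1.0 - cos_angle ** 2)

# Toy "model outputs" (hypothetical vectors, not the real stimuli).
part_a = [1, 1, 0, 0, 0, 0]   # stands in for one of the identical lines
part_b = [0, 0, 1, 1, 0, 0]   # stands in for the odd line
corner = [0, 0, 0, 0, 1, 1]   # identical context added to both

whole_a = [p + c for p, c in zip(part_a, corner)]
whole_b = [p + c for p, c in zip(part_b, corner)]

parts_d = sine_dissimilarity(part_a, part_b)
wholes_d = sine_dissimilarity(whole_a, whole_b)
print(parts_d, wholes_d)
```

In this toy case the shared corner contributes identical components to both vectors, which shrinks the angle between them and therefore lowers the dissimilarity for wholes relative to parts.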

2.4 Additional details 

For details about the experimental procedure and young participants (ages 21-37 years) whose 
behavioral and fMRI data we used in this study, please refer to an earlier report by Kubilius et al. (2011). 